Whole‐exome sequencing of a Saudi epilepsy cohort reveals association signals in known and potentially novel loci

Background Epilepsy, a serious chronic neurological condition effecting up to 100 million people globally, has clear genetic underpinnings including common and rare variants. In Saudi Arabia, the prevalence of epilepsy is high and caused mainly by perinatal and genetic factors. No whole-exome sequencing (WES) studies have been performed to date in Saudi Arabian epilepsy cohorts. This offers a unique opportunity for the discovery of rare genetic variants impacting this disease as there is a high rate of consanguinity among large tribal pedigrees. Results We performed WES on 144 individuals diagnosed with epilepsy, to interrogate known epilepsy-related genes for known and functional novel variants. We also used an American College of Medical Genetics (ACMG) guideline-based variant prioritization approach in an attempt to discover putative causative variants. We identified 32 potentially causative pathogenic variants across 30 different genes in 44/144 (30%) of these Saudi epilepsy individuals. We also identified 232 variants of unknown significance (VUS) across 101 different genes in 133/144 (92%) subjects. Strong enrichment of variants of likely pathogenicity was observed in previously described epilepsy-associated loci, and a number of putative pathogenic variants in novel loci are also observed. Conclusion Several putative pathogenic variants in known epilepsy-related loci were identified for the first time in our population, in addition to several potential new loci which may be prioritized for further investigation. Supplementary Information The online version contains supplementary material available at 10.1186/s40246-022-00444-6.


Background
Epilepsy is a group of chronic neurological diseases characterized by unprovoked seizures due to abnormal neuronal firing in the brain. It is one of the most frequently encountered medical problems in neurology clinics. Epilepsy affects up to 1 in 26 individuals in the USA and approximately 5-7 per 1000 individuals worldwide with reports indicating that at least 50 to 100 million people in the world suffer from this disease, with approximately 85% of these cases living in developing countries [1][2][3]. However, 20-30% of all epilepsy cases are due to acquired factors, such as infection, stroke, trauma, neoplasms, and autoimmunity, while the remaining cases are thought to be due to genetic factors [4]. Epidemiological studies have observed an increased risk of epilepsy development in the relatives of epileptic individuals [5].
Genome-wide association studies (GWAS) of epilepsy phenotypes using genome-wide genotyping (GWG) arrays have discovered many single-nucleotide polymorphisms (SNPs) associated with the disease. A meta-analysis conducted by the International League Against Epilepsy in 2014 identified SNPs associated with the development of epilepsy in the sodium voltagegated channel alpha subunit 1 (SCN1A), Protocadherin 7 (PCDH7), FA Complementation Group L (FANCL) and Vaccinia Related Kinase 2 (VRK2) genes [6]. A subsequent large GWAS encompassing over 15,200 epilepsy subjects and 29,600 controls revealed 16 genome-wide significant signals reaching statistical significance. Genes in these loci have diverse roles in histone modification, ion-channels, pyridoxine metabolism, synaptic transmission as well as transcription factors. Interestingly, functional annotation of almost 500 SNPs, that were observed to be significant at a genome-wide statistical threshold, found that most are in intronic (46%) or intergenic (29%) regions, with only 4 non-synonymous SNP associations observed, of which 2 were missense variants [7]. Furthermore, approximately half of the SNPs were implicated to impact gene transcription from gene regulation database analyses [7].
Second-generation sequencing technologies including whole-exome sequencing (WES) have been used in the diagnostic or research settings to identify genetic mutation(s) which may cause highly variable phenotypic expression in epileptic subjects [8][9][10]. The very nature of the WES approach favors detection of rarer highly penetrant gene-coding variants which are often hard to find in GWG-based GWAS approaches.
Performing WES in Saudi Arabian epilepsy populations offer a unique opportunity for the discovery of rare genetic variants impacting this disease as there is a high rate of consanguinity among large tribal pedigrees. A study in the Eastern province of Saudi Arabia has reported the prevalence of epilepsy to be around 6-7 per 1000 individuals [11]. In the present study, WES was performed on 144 individuals diagnosed with epilepsy with varying age-of-onset. An American College of Medical Genetics (ACMG) guideline-based variant prioritization approach followed WES which allowed for the discovery of potentially causative variants in our cohort of epilepsy subjects.

Patient sampling and ethical approval
Over a period spanning 2018-2020, samples and data from consecutive subjects with epilepsy attending the Neurosurgery Clinics, King Fahd Hospital of the University, Al-Khobar, and King Fahd Hospital, Alhafof, Saudi Arabia, were collected for inclusion in this study. Participants ranged in age from 13-51 and were clinically diagnosed with epilepsy at the point of recruitment. The phenotype data of all subjects were reviewed by a consultant committee to verify uniformity among sites and eligibility consistent with International League Against Epilepsy [12]. In this study, the diagnosis of epilepsy was made by a consultant specializing in epilepsy based on the patients' clinical history. Moreover, cases with moderate-to-severe intellectual disability, or cancer, were excluded from the study. If there was doubt regarding a patient's phenotypic eligibility, the individual cases were reviewed by the consultant committee and if needed, additional data were requested prior to a decision being made by the committee regarding inclusion of the patient. However, on completion of the project the medical records of all patients were reviewed again. Among all our epileptic patients included in the study, eight patients also had type 2 diabetes, five patients had hypertension, one patient had cardiovascular disease and stroke, one patient had polycystic ovarian syndrome, and two patients had muscular dystrophy. Table 1 outlines the subjects' demographic and clinical characteristics.
Ethical approval for the study was obtained from the local Institutional Review Board (IRB) committees (IRB-2015-01-063), and the study was conducted according to the ethical principles of the Declaration of Helsinki and Good Clinical Practice guidelines. All patients included in the study signed a written informed consent.

DNA sequencing, read alignment, variant calling and quality control
Blood samples were collected from subjects in EDTA vacutainers and after collection were immediately stored at − 80 °C. Standard DNA preparation was performed using DNeasy Blood kits (Qiagen, MD, USA). Wholeexome sequencing libraries were generated using the Agilent SureSelect Human All Exon Kit V5 (Agilent, CA, USA) and sequenced on a HiSeq 2500 instrument (Illumina, CA, USA) using standard paired-end sequencing protocol. Raw sequencing reads were stored as FASTQ files and then aligned to the human reference genome (GRCh37) using Illumina's Dynamic Read Analysis for GENomics (DRAGEN) Pipeline. Resultant BAM files were position-sorted and duplicate reads marked. Singlesample gVCF files were generated by the DRAGEN Germline Pipeline, and joint calling of all samples in the study cohort were performed by DRAGEN Joint Genotyping.

Principal components analysis (PCA) and Kinship
KING was used for relatedness inference based on the genotype of exome SNPs (MAF > 0.01) [13]. Estimated kinship coefficient and number of SNPs with zero shared alleles (IBS0) between a pair of individuals were plotted. Parent-offspring, sibling pairs, and unrelated pairs can be distinguished as separate clusters on the scatterplot. Ancestry and kinship toolkit (AKT) was used to calculate PCAs and plot the results [14].

Variant annotation, filtering and prioritization
Variants were annotated with SnpEff to predict the effects of variants [15]. Rare variants were defined as minor allele frequency (MAF) < 1% in the Genome Aggregation Database (GnomAD) [16]. Intronic, synonymous, 3' and 5' UTR, up-and downstream variants were identified and excluded from the analysis. The remaining rare variants were considered to be potentially deleterious variants. Genetic variants classified in ClinVar as "Likely pathogenic" or "Pathogenic", and in Human Gene Mutation Database (HGMD) as disease-causing mutations (DM) for epilepsy or seizures were collected and curated together with research literatures to server as the knowledgebase for variant prioritization and classification [17,18].

Principal component analysis
The common genetic variants of these Saudi epilepsy individuals show a unique cluster when compared to the world's major populations based on principal component analysis (Fig. 1). The samples demonstrated a genetically matched background which avoided false attribution of associations due to population stratification.

Potentially pathogenic rare variants identified in 44 epilepsy subjects
Based on a combination of variant filtering, and integration of variant databases in ClinVar, and HGMD, we identified a total of 32 potentially causative pathogenic variants across 30 different genes in 44/144 (30%) epilepsy subjects as shown in Table 2. Additional file 1: Supplementary Table S1 outlines additional minor allele frequencies in additional databases and further annotation of these putative pathogenic variants. The 30 genes harboring the likely pathogenic variants observed in these 44 Saudi epilepsy subjects were then assessed for overlap with 102 previously collated monogenic epilepsy genes [7]. Likely pathogenic variants in 12 of these 102 monogenic epilepsy genes were observed in 44 epileptic subjects: CHRNA4, CLN3, CLN8, DEPDC5, KCNJ10, KCNMA1, POLG, PRICKLE1, SCN1A, SCN2A, SCN8A and SCN9A. Of the 18 additional genes from Table 2 with likely pathogenic variants, a number including SZT2, SCN10A, UBA5 have been reported in wholeexome sequencing in epilepsy subjects [19,20]. Only one homozygous mutation, a stop-gain, was observed in single individual [21], in Potassium Calcium-Activated Channel Subfamily M Alpha 1 (KCNMA), a calciumsensitive potassium channel gene which has been shown to have a role in general and early-onset epilepsy-related phenotypes [22][23][24][25][26]. This individual, a female who was diagnosed with childhood epilepsy at 12 years old, has one family member with a diagnosis of epilepsy and was assessed to have first degree of consanguinity.

Most commonly observed likely pathogenic variants
An in-frame insertion mutation in non-imprinted in Prader-Willi/Angelman syndrome region protein 2 (NIPA2) (chr15, GRCh37 position: 23006299) was observed in seven of the study subjects (shown in Table 3 along with age of onset). A missense variant in SH2B Adaptor Protein 3 (SH2B3) (chr12, position:

Variants of unknown significance identified in epilepsy 133 subjects
In highly curated genomic disease databases such as Clin-Var, there are a large number of variants of unknown significance (VUS), where there is unknown or conflicting clinical significance to date for association of such variants with epilepsy-related phenotypes. We identified 232 variants of unknown significance across 101 different genes in 133/144 (92%) subjects as shown in Additional file 2: Supplementary Table S2. Interestingly when the genes harboring these variants of unknown significance were intersected with the 102 previously collated monogenic epilepsy genes it was observed that 43 of these monogenic gene variants were enriched in the 101 loci with VUS from these 133 Saudi epilepsy subjects (Additional file 2: Table S2). We note that two individuals were observed to have homozygous potential pathogenic variants. One individual showed homozygosity for a missense variant in Spermatogenesis Associated 5 (SPATA5), with 3 individuals showing heterozygosity for this mutation (Additional file 2: Supplementary Table S2). SPATA5 has shown clear association with severe childhood epilepsy [27][28][29][30][31][32]. This female individual was diagnoses with childhood epilepsy at the age of 10 years old and two family members having a diagnosis of epilepsy, and both parents are listed as cousins.
Another homozygous mutation was observed in one individual for a missense mutation in Calcium Voltage-Gated Channel Subunit Alpha1 H (CACNA1H) with three additional individuals also carrying one copy of this mutation. Mutations in this gene have been associated with generalized and severe epilepsies [33,34], although it has been debated whether this is a bona fide monogenic epilepsy gene [35]. This female individual was diagnosed with epilepsy at 15 years of age and does not have any other family members with a diagnosis of epilepsy, or any consanguinity noted.

Discussion
We performed the first whole-exome sequencing study in Saudi Arabia epilepsy subjects. Using 144 individuals, we compared putative pathogenic variants as well as variants of unknown significance with population-based whole-exome sequencing and whole genome sequencing databases. The highest number of observed mutations across the 144 subjects were observed in NIPA2, a highly selective magnesium transporter. This in-frame insertion variant (NP 001171818 .1 p. (Asn334 Glu335insAsp )) was observed in 7 subjects from 144 overall (5%). This variant has been previously reported within a population of subjects with childhood absence epilepsy (CAE) [36][37][38].
A missense variant in CHRNA4 was observed in 4 subjects. CHRNA4 is a nicotinic acetylcholine receptor, belonging to a superfamily of ligand-gated ion channels which play an established role in signal transmission at synapses. Mutations in CHRNA4 have been reported with nocturnal frontal lobe epilepsy type 1. A missense variant in SH2B3 was observed in 4 subjects. This gene is involved in a range of signaling activities by growth factor and cytokine receptors as part of the SH2B adaptor family of proteins. Mutations in this gene have been associated with susceptibility to celiac disease type 13 and susceptibility to insulin-dependent diabetes mellitus. It has low expression in the brain however as evident in the Genotype-Tissue Expression (GTEx) database. Missense mutations in STIL were observed in 3 subjects. STIL is a cytoplasmic protein which plays a role in the regulation of the mitotic checkpoint machinery. It too has low expression levels in all GTEx brain tissues.
There are a number of prioritized signals observed in this Saudi epilepsy study that may be novel or have very limited reports of association. Deficiency of WARS2 was observed in a patient with severe infantile-onset leukoencephalopathy, profound intellectual disability, spastic quadriplegia, epilepsy and microcephaly [39]. Rare mutations in DPYD have been implicated in children with unspecific neurological symptoms [40]. Epileptic encephalopathy caused by recessive loss-of-function (LoF) mutations have been reported in DENND5A [41]. A previous report of infantile cerebral and cerebellar atrophy showed association with a mutation in MED17 [42]. A LoF mutation in HCN4 has been reported to be associated with Familial benign myoclonic epilepsy in infancy [43]. Mutations in STRADA, SYNJ1, CACNA1A and NPRL3 have also been reported with severe epilepsyrelated disease, but the association has not been reported for heterozygous variants in isolated forms of epilepsy [44][45][46][47][48][49][50][51].
A number of prioritized signals of putative pathogenicity were observed that have no reports of association with epilepsy in the literature including SEC24D, PCCA, MYO5A which may be strong candidates for further functional studies. SEC24D was reported to play a role in in vesicle trafficking and mutations in this gene are associated with Cole-Carpenter syndrome, a disorder affecting bone formation [52]. PCCA codes for the alpha subunit of the mitochondrial enzyme Propionyl-CoA carboxylase, and mutations in this gene leads to enzyme deficiency and are associated with propionic acidemia [53]. MYO5A encodes myosin 5A, and mutations in this gene are associated with Griscelli syndrome, which is characterized by hypopigmentation and a primary neurological abnormality [54]. These aforementioned genes may be good candidates for further functional studies.
This study is limited in that incomplete pedigrees and depth of sub-phenotyping are available, although putative pathogenic variants many known epilepsy-related loci are evident, and a number of potential new loci may be prioritized for further investigation. The Saudi population offers a lot of promise for elucidation of the genetic etiology of common diseases such as epilepsy due to consanguinity, with extended homozygosity stretches often observed over several megabases, affording the opportunity to enrich for recessive forms of epilepsy. While this study looked at 144 subjects, we are aiming to expand the cohort to encompass genetic analyses of extended pedigrees with additional phenotyping.

Conclusions
We identified for the first time 32 potentially causative pathogenic variants in Saudi individuals with epilepsy. In addition, several potential new loci have been identified that have no reports of association with epilepsy in other populations. These potentially causative pathogenic variants in these new loci may be prioritized for further functional studies.